318 research outputs found

Graphle: Interactive exploration of large, dense graphs

Author: Huttenhower Curtis
Mehmood Sajid O
Troyanskaya Olga G
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: A wide variety of biological data can be modeled as network structures, including experimental results (e.g. protein-protein interactions), computational predictions (e.g. functional interaction networks), or curated structures (e.g. the Gene Ontology). While several tools exist for visualizing large graphs at a global level or small graphs in detail, previous systems have generally not allowed interactive analysis of dense networks containing thousands of vertices at a level of detail useful for biologists. Investigators often wish to explore specific portions of such networks from a detailed, gene-specific perspective, and balancing this requirement with the networks' large size, complex structure, and rich metadata is a substantial computational challenge. Results: Graphle is an online interface to large collections of arbitrary undirected, weighted graphs, each possibly containing tens of thousands of vertices (e.g. genes) and hundreds of millions of edges (e.g. interactions). These are stored on a centralized server and accessed efficiently through an interactive Java applet. The Graphle applet allows a user to examine specific portions of a graph, retrieving the relevant neighborhood around a set of query vertices (genes). This neighborhood can then be refined and modified interactively, and the results can be saved either as publication-quality images or as raw data for further analysis. The Graphle web site currently includes several hundred biological networks representing predicted functional relationships from three heterogeneous data integration systems: S. cerevisiae data from bioPIXIE, E. coli data using MEFIT, and H. sapiens data from HEFalMp. Conclusions: Graphle serves as a search and visualization engine for biological networks, which can be managed locally (simplifying collaborative data sharing) and investigated remotely. The Graphle framework is freely downloadable and easily installed on new servers, allowing any lab to quickly set up a Graphle site from which their own biological network data can be shared online.

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PILGRM: an interactive data-driven discovery platform for expert biologists

Author: Avraham
Bolstad
C. S. Greene
Calviello
Chikina
Dai
Faith
Gautier
Gentleman
Giaever
Harsha
Hess
Hibbs
Irizarry
Nielsen
O. G. Troyanskaya
Santos
Tedder
Yan
Publication venue: Oxford University Press
Publication date: 01/01/2011

PILGRM (the platform for interactive learning by genomics results mining) puts advanced supervised analysis techniques applied to enormous gene expression compendia into the hands of bench biologists. This flexible system empowers its users to answer, in a data-driven manner, diverse biological questions that are often outside the scope of common databases. This capability allows domain experts to quickly and easily generate hypotheses about biological processes, tissues or diseases of interest. Specifically, PILGRM helps biologists generate these hypotheses by analyzing the expression levels of known relevant genes in large compendia of microarray data. Because PILGRM is data-driven, it complements a user's knowledge and literature analysis with mining of diverse functional genomic data, thereby generating novel predictions that can drive experimental follow-up. This server is free, does not require registration and is available for use at http://pilgrm.princeton.edu

Princeton University Open Access Repository

Crossref

PubMed Central

MADAM - An open source meta-analysis toolbox for R and Bioconductor

Author: Armin Graber
DR Rhodes
F Hong
I Borozan
JD Storey
JK Choi
Karl G Kugler
Laurin AJ Mueller
O Troyanskaya
RA Fisher
RC Gentleman
VG Tusher
Y Moreau
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

Author: Coller Hilary A
Flamholz Avi I
Hibbs Matthew A
Huttenhower Curtis
Landis Jessica N
Myers Chad L
Olszewski Kellen L
Sahi Sauhard
Siemers Nathan O
Troyanskaya Olga G
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.
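The mutual-nearest-neighbor idea in the abstract above can be illustrated with a minimal sketch. This is not the NNN implementation: it only builds the mutual k-nearest-neighbor graph and omits the overlapping-clique step, and the function name `mutual_knn_edges`, the Euclidean distance, the value of `k`, and the toy profiles are all assumptions made for illustration.

```python
from math import dist


def mutual_knn_edges(points, k):
    """Return undirected edges (i, j), i < j, where i and j are each
    among the other's k nearest neighbors (Euclidean distance)."""
    n = len(points)
    neighbors = []
    for i in range(n):
        # Rank every other point by distance to point i; ties break on index.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (dist(points[i], points[j]), j),
        )
        neighbors.append(set(others[:k]))
    # Keep an edge only when the neighbor relation holds in both directions.
    return {
        (i, j)
        for i in range(n)
        for j in neighbors[i]
        if i < j and i in neighbors[j]
    }


# Hypothetical toy "expression profiles": two tight groups and one outlier.
profiles = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (20.0, 20.0)]
edges = mutual_knn_edges(profiles, k=2)

# Points that take part in no mutual edge stay unclustered.
clustered = {v for edge in edges for v in edge}
unclustered = sorted(set(range(len(profiles))) - clustered)
print(sorted(edges), unclustered)
```

In this toy data the outlier lists other points among its nearest neighbors, but none of them lists it back, so it gains no edge. That is the behavior the abstract describes for genes with no sufficiently similar partners.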

The Jackson Laboratory: The Mouseion at the JAXlibrary

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Neural Approach to Ordinal Regression for the Preventive Assessment of Developmental Dyslexia

Author: FJ Martinez-Murcia
FJ Martinez-Murcia
FJ Martínez-Murcia
JD Olden
L Kimppa
M Bartlett
O Troyanskaya
PA Thompson
PG Spetsieris
R Peterson
SE Shaywitz
U Goswami
U Goswami
Publication venue: Springer Science and Business Media LLC
Publication date: 20/10/2020

Developmental Dyslexia (DD) is a learning disability related to the acquisition of reading skills that affects about 5% of the population. DD can have an enormous impact on the intellectual and personal development of affected children, so early detection is key to implementing preventive strategies for teaching language. Research has shown that there may be biological underpinnings to DD that affect phoneme processing, and hence these symptoms may be identifiable before reading ability is acquired, allowing for early intervention. In this paper we propose a new methodology to assess the risk of DD before students learn to read. For this purpose, we propose a mixed neural model that calculates risk levels of dyslexia from tests that can be completed at the age of 5 years. Our method first trains an auto-encoder, and then combines the trained encoder with an optimized ordinal regression neural network devised to ensure consistency of predictions. Our experiments show that the system is able to detect unaffected subjects two years before it can assess the risk of DD based mainly on phonological processing, giving a specificity of 0.969 and a correct rate of more than 0.92. In addition, the trained encoder can be used to transform test results into an interpretable subject spatial distribution that facilitates risk assessment and validates the methodology. Comment: 12 pages, 4 figures

arXiv.org e-Print Archive

Crossref

Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields

Author: Albertson
Albertson
Beitzinger
Brown
Chin
E. M. Airoldi
Han
Heim
Huang
Jonsson
Nag
O. G. Troyanskaya
R. E. Schapire
Rocke
Rueda
Shah
Snijders
V. Dumeaux
van Beers
Wessels
Yi
Z. Barutcuoglu
Publication venue: Oxford University Press
Publication date: 01/01/2008

Motivation: The heterogeneity of cancer cannot always be recognized by tumor morphology, but may be reflected by the underlying genetic aberrations. Array comparative genome hybridization (array-CGH) methods provide high-throughput data on genetic copy numbers, but determining the clinically relevant copy number changes remains a challenge. Conventional classification methods for linking recurrent alterations to clinical outcome ignore sequential correlations in selecting relevant features. Conversely, existing sequence classification methods can only model overall copy number instability, without regard to any particular position in the genome.

Crossref

Harvard University - DASH

PubMed Central

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Missing values: sparse inverse covariance estimation and an extension to sparse regression

Author: A. Dempster
A. Rothman
A. Wille
C. Wu
D.M. Witten
G.D. Murray
J. Friedman
J. Friedman
J. Friedman
J.G. Ibrahim
J.L. Schafer
M. Yuan
N. Meinshausen
N. Meinshausen
N. Städler
Nicolas Städler
O. Banerjee
O. Troyanskaya
P. Tseng
Peter Bühlmann
R. Tibshirani
R.J.A. Little
S. Buck
S. Lauritzen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

We propose an l1-regularized likelihood method for estimating the inverse covariance matrix in the high-dimensional multivariate normal model in the presence of missing data. Our method is based on the assumption that the data are missing at random (MAR), which also covers the missing completely at random case. The implementation of the method is non-trivial, as the observed negative log-likelihood is generally a complicated and non-convex function. We propose an efficient EM algorithm for optimization with provable numerical convergence properties. Furthermore, we extend the methodology to handle missing values in a sparse regression context. We demonstrate both methods on simulated and real data. Comment: The final publication is available at http://www.springerlink.co

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

Nucleosome-coupled expression differences in closely-related species

Author: AL Olins
AM Tsankov
BE Bernstein
C Koch
Corey Nislow
CT Harbison
DE Schones
E Segal
EA Sekinger
F Ozsolak
G Badis
G Zhu
GC Yuan
GJ Hogan
H Li
I Tirosh
IP Ioshikhes
JD Hughes
KA Zawadzki
Kyle Tsui
L Bai
Maitreya J Dunham
Marinella Gebbia
N Kaplan
N Morohashi
O Elemento
O Troyanskaya
OC Martin
Olga G Troyanskaya
P Cliften
P Clifton
RD Kornberg
S Mahony
S Shivaswamy
S Washietl
SW Doniger
T Owen-Hughes
T Pramila
Victoria Yao
W Lee
X Liu
Y Guan
Y Zhang
Yuanfang Guan
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Genome-wide nucleosome occupancy is negatively related to the average level of transcription factor motif binding based on studies in yeast and several other model organisms. The degree to which nucleosome-motif interactions relate to phenotypic changes across species is, however, unknown. Results: We address this challenge by generating nucleosome positioning and cell cycle expression data for Saccharomyces bayanus and show that differences in nucleosome occupancy reflect cell cycle expression divergence between two yeast species, S. bayanus and S. cerevisiae. Specifically, genes with nucleosome-depleted MBP1 motifs upstream of their coding sequence show periodic expression during the cell cycle, whereas genes with nucleosome-shielded motifs do not. In addition, conserved cell cycle regulatory motifs across these two species are more nucleosome-depleted compared to those that are not conserved, suggesting that the degree of conservation of regulatory sites varies, and is reflected by nucleosome occupancy patterns. Finally, many changes in cell cycle gene expression patterns across species can be correlated to changes in nucleosome occupancy on motifs (rather than to the presence or absence of motifs). Conclusions: Our observations suggest that alteration of nucleosome occupancy is a previously uncharacterized feature related to the divergence of cell cycle expression between species.

University of Toronto Research Repository

Princeton University Open Access Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Moore-Penrose Pseudoinverse. A Tutorial Review of the Theory

Author: AN Tikhonov
C Hennig
CW Groetsch
DK Sodickson
EH Moore
G Lohmann
H Ammari
H Andrews
J-Z Wang
João Carlos Alves Barata
L Bedini
Mahir Saleh Hussein
MT Page
NG Gençer
O Bratteli
O Troyanskaya
R Penrose
R Penrose
R Walle Van De
RD Pascual-Marqui
S Atzori
TV Ricci
X Li
Publication venue: Springer Science and Business Media LLC
Publication date: 31/10/2011

In the last decades the Moore-Penrose pseudoinverse has found a wide range of applications in many areas of science and has become a useful tool for physicists dealing, for instance, with optimization problems, with data analysis, with the solution of linear integral equations, etc. The existence of such applications alone should attract the interest of students and researchers in the Moore-Penrose pseudoinverse and in related subjects, like the singular value decomposition theorem for matrices. In this note we present a tutorial review of the theory of the Moore-Penrose pseudoinverse. We present the first definitions and some motivations and, after obtaining some basic results, we center our discussion on the Spectral Theorem and present an algorithmically simple expression for the computation of the Moore-Penrose pseudoinverse of a given matrix. We do not claim originality of the results. We rather intend to present a complete and self-contained tutorial review, useful for those more devoted to applications, for those more theoretically oriented and for those who already have some working knowledge of the subject. Comment: 23 pages
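For orientation, the standard textbook definition of the pseudoinverse mentioned in the abstract above can be written out as follows. This is the usual formulation through the four Penrose conditions and the singular value decomposition; it is not necessarily the expression derived in the tutorial, and the symbols $A$, $U$, $\Sigma$, $V$ are notation assumed here.

```latex
% Penrose conditions: A^{+} is the unique matrix satisfying all four.
\begin{align}
  A A^{+} A &= A, & A^{+} A A^{+} &= A^{+}, \\
  (A A^{+})^{*} &= A A^{+}, & (A^{+} A)^{*} &= A^{+} A.
\end{align}

% Given a singular value decomposition of A, the pseudoinverse follows
% directly; \Sigma^{+} is obtained by transposing \Sigma and inverting
% each nonzero singular value.
\[
  A = U \Sigma V^{*}
  \quad\Longrightarrow\quad
  A^{+} = V \Sigma^{+} U^{*}.
\]
```

When $A$ is square and invertible, all four conditions are satisfied by $A^{-1}$, so the pseudoinverse reduces to the ordinary inverse.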

arXiv.org e-Print Archive

CiteSeerX

Crossref

Ethnic Variation in Inflammatory Profile in Tuberculosis

Author: A Dominguez-Castellano
Adrian R. Martineau
AE Hirsh
AK Coussens
Alleyna P. Claxton
AM Cooper
Anna K. Coussens
AR Martineau
AR Martineau
C Manca
C Winkler
Charles A. Mein
Christopher J. Griffiths
CM Stein
D Modiano
D Portevin
E Wheeler
EG Hoal-van Helden
Francis A. Drobniewski
Geoffrey E. Packe
Graham H. Bothamley
Heather J. Milburn
JC Chambers
JF Djoba Siawaya
JF Djoba Siawaya
Kamrul Islam
L Baker
Leena Bhaw-Rosun
Lucy V. Baker
M Pareek
M Speeckaert
M Tanveer
M Weiner
Mathina Darmalingam
MB Reed
NA Rosenberg
O Mwantembe
O Troyanskaya
P McCullagh
Paul T. Elkington
Peter M. Timms
RF Chun
Richard D. Barker
Robert J. Wilkinson
Robert N. Davidson
Rosamond A. Nuamah
S Brahmbhatt
S Gagneux
S Gagneux
SA Theus
SM Newton
T Brown
Thomas R. Hawn
Vladyslav Nikolayevskyy
W Fox
WR Mac Kenzie
WX Feng
Y Benjamini
Yasmeen Hanifa
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

PMCID: PMC3701709. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Public Library of Science (PLOS)

Southampton (e-Prints Soton)

Crossref

Directory of Open Access Journals

PubMed Central

Spiral - Imperial College Digital Repository

Queen Mary Research Online

King's Research Portal

FigShare